37 research outputs found

    Real-Time Streaming Multi-Pattern Search for Constant Alphabet

    In the streaming multi-pattern search problem, which is also known as the streaming dictionary matching problem, a set D={P_1,P_2,...,P_d} of d patterns (strings over an alphabet Sigma), called the dictionary, is given to be preprocessed. Then, a text T arrives one character at a time and the goal is to report, before the next character arrives, the longest pattern in the dictionary that is a current suffix of T. We prove that for a constant size alphabet, there exists a randomized Monte-Carlo algorithm for the streaming dictionary matching problem that takes constant time per character and uses O(d log m) words of space, where m is the length of the longest pattern in the dictionary. In the case where the alphabet size is not constant, we introduce two new randomized Monte-Carlo algorithms with the following complexities:
    * O(log log |Sigma|) time per character in the worst case and O(d log m) words of space.
    * O(1/epsilon) time per character in the worst case and O(d |Sigma|^epsilon log m/epsilon) words of space for any 0<epsilon<=1.
    These results improve upon the algorithm of [Clifford et al., ESA '15] which uses O(d log m) words of space and takes O(log log (m+d)) time per character.
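    To make the problem statement above concrete, here is a minimal brute-force sketch in Python. It is not the algorithm from the abstract: it stores the last m characters and scans every pattern on each arrival, so it meets neither the space nor the time bounds quoted there. The class and method names are hypothetical, chosen only for illustration.

```python
from collections import deque


class NaiveStreamingDictionaryMatcher:
    """Brute-force baseline for streaming dictionary matching (illustrative only)."""

    def __init__(self, patterns):
        # Longest patterns first, so the first suffix match found is the longest one.
        self.patterns = sorted(set(patterns), key=len, reverse=True)
        max_len = max((len(p) for p in self.patterns), default=0)
        # Only the last m characters can matter, where m is the longest pattern length.
        self.window = deque(maxlen=max_len)

    def feed(self, ch):
        """Consume one character and return the longest pattern that is
        a suffix of the text seen so far, or None if there is none."""
        self.window.append(ch)
        recent = "".join(self.window)
        for p in self.patterns:
            if recent.endswith(p):
                return p
        return None


matcher = NaiveStreamingDictionaryMatcher(["he", "she", "hers"])
reports = [matcher.feed(c) for c in "ushers"]
```

    On the text "ushers" this baseline reports "she" after the fourth character and "hers" after the sixth; the shorter pattern "he" is never reported because a longer pattern ends at the same position.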

    Streaming Pattern Matching with d Wildcards

    In the pattern matching with d wildcards problem we are given a text T of length n and a pattern P of length m that contains d wildcard characters, each denoted by a special symbol '?'. A wildcard character matches any other character. The goal is to establish for each m-length substring of T whether it matches P. In the streaming model variant of the pattern matching with d wildcards problem the text T arrives one character at a time and the goal is to report, before the next character arrives, if the last m characters match P while using only o(m) words of space. In this paper we introduce two new algorithms for the d wildcard pattern matching problem in the streaming model. The first is a randomized Monte Carlo algorithm that is parameterized by a constant 0<=delta<=1. This algorithm uses ~O(d^{1-delta}) amortized time per character and ~O(d^{1+delta}) words of space. The second algorithm, which is used as a black box in the first algorithm, is a randomized Monte Carlo algorithm which uses O(d+log m) worst-case time per character and O(d log m) words of space.
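    The matching condition described above can be sketched with a brute-force baseline. This is only an illustration of the problem definition, not either algorithm from the abstract: it keeps the last m characters explicitly, so it uses Theta(m) space rather than o(m). The function name and the `wildcard` parameter are assumptions made for this example.

```python
from collections import deque


def stream_wildcard_matches(pattern, text, wildcard="?"):
    """For each arriving character, yield True if the last m characters
    of the text match the pattern, treating `wildcard` as matching anything."""
    m = len(pattern)
    window = deque(maxlen=m)  # the last m characters seen so far
    for ch in text:
        window.append(ch)
        yield len(window) == m and all(
            p == wildcard or p == c for p, c in zip(pattern, window)
        )


answers = list(stream_wildcard_matches("a?c", "abcaxc"))
```

    For the pattern "a?c" and the text "abcaxc", the generator reports a match after the third character ("abc") and after the sixth ("axc").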

    Locally Consistent Parsing for Text Indexing in Small Space

    We consider two closely related problems of text indexing in a sub-linear working space. The first problem is the Sparse Suffix Tree (SST) construction of a set of suffixes $B$ using only $O(|B|)$ words of space. The second problem is the Longest Common Extension (LCE) problem, where for some parameter $1\le\tau\le n$, the goal is to construct a data structure that uses $O(\frac{n}{\tau})$ words of space and can compute the longest common prefix length of any pair of suffixes. We show how to use ideas based on the Locally Consistent Parsing technique, that was introduced by Sahinalp and Vishkin [STOC '94], in some non-trivial ways in order to improve the known results for the above problems. We introduce new Las-Vegas and deterministic algorithms for both problems. We introduce the first Las-Vegas SST construction algorithm that takes $O(n)$ time. This is an improvement over the last result of Gawrychowski and Kociumaka [SODA '17] who obtained $O(n)$ time for a Monte-Carlo algorithm, and $O(n\sqrt{\log |B|})$ time for a Las-Vegas algorithm. In addition, we introduce a randomized Las-Vegas construction for an LCE data structure that can be constructed in linear time and answers queries in $O(\tau)$ time. For the deterministic algorithms, we introduce an SST construction algorithm that takes $O(n\log \frac{n}{|B|})$ time (for $|B|=\Omega(\log n)$). This is the first almost linear time, $O(n\cdot \mathrm{polylog}\, n)$, deterministic SST construction algorithm, where all previous algorithms take at least $\Omega\left(\min\{n|B|,\frac{n^2}{|B|}\}\right)$ time. For the LCE problem, we introduce a data structure that answers LCE queries in $O(\tau\sqrt{\log^* n})$ time, with $O(n\log\tau)$ construction time (for $\tau=O(\frac{n}{\log n})$). This data structure improves both query time and construction time upon the results of Tanimura et al. [CPM '16]. Comment: Extended abstract to appear in SODA 202
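    As a point of reference for the LCE query defined above, the following sketch answers a query by direct character comparison. It is only the textbook definition, assuming 0-based suffix positions; it uses no index at all and takes time proportional to the answer, unlike the data structures in the abstract. The function name is hypothetical.

```python
def lce(text, i, j):
    """Length of the longest common prefix of the suffixes text[i:] and text[j:],
    computed by comparing characters one at a time."""
    n = len(text)
    length = 0
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length


example = lce("banana", 1, 3)  # suffixes "anana" and "ana"
```

    Here the suffixes "anana" and "ana" share the prefix "ana", so the query returns 3.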

    Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams

    Recently, there has been a growing focus in solving approximate pattern matching problems in the streaming model. Of particular interest are the pattern matching with k-mismatches (KMM) problem and the pattern matching with w-wildcards (PMWC) problem. Motivated by reductions from these problems in the streaming model to the dictionary matching problem, this paper focuses on designing algorithms for the dictionary matching problem in the multi-stream model where there are several independent streams of data (as opposed to just one in the streaming model), and the memory complexity of an algorithm is expressed using two quantities: (1) a read-only shared memory storage area which is shared among all the streams, and (2) local stream memory that each stream stores separately. In the dictionary matching problem in the multi-stream model the goal is to preprocess a dictionary D={P_1,P_2,...,P_d} of d=|D| patterns (strings with maximum length m over alphabet Sigma) into a data structure stored in shared memory, so that given multiple independent streaming texts (where characters arrive one at a time) the algorithm reports occurrences of patterns from D in each one of the texts as soon as they appear. We design two efficient algorithms for the dictionary matching problem in the multi-stream model. The first algorithm works when all the patterns in D have the same length m and costs O(d log m) words in shared memory, O(log m log d) words in stream memory, and O(log m) time per character. The second algorithm works for general D, but the time cost per character becomes O(log m+log d log log d). We also demonstrate the usefulness of our first algorithm in solving both the KMM problem and PMWC problem in the streaming model. In particular, we obtain the first almost optimal (up to poly-log factors) algorithm for the PMWC problem in the streaming model. We also design a new algorithm for the KMM problem in the streaming model that, up to poly-log factors, has the same bounds as the most recent results that use different techniques. Moreover, for most inputs, our algorithm for KMM is significantly faster on average.

    The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time

    We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$\sigma$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$ time. Moreover, it is not known whether improvements over the $\tilde O(n\sqrt k)$ total time are possible when using more than $O(k)$ space. We address these gaps by designing a randomized streaming algorithm for the $k$-mismatch problem that, given an integer parameter $k\le s \le m$, uses $\tilde O(s)$ space and costs $\tilde O\big(n+\min\big(\frac{nk^2}{m},\frac{nk}{\sqrt s},\frac{\sigma nm}{s}\big)\big)$ total time. For $s=m$, the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt k\big)$. Comment: Extended abstract to appear in CPM 202
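    For readers unfamiliar with the k-mismatch problem itself, a brute-force offline sketch follows. It is not the streaming algorithm from the abstract and has none of its bounds: it simply computes, for every alignment of the pattern against the text, the Hamming distance capped at k. The function name and the convention of returning None for alignments with more than k mismatches are assumptions made for this example.

```python
def k_mismatch(pattern, text, k):
    """For every alignment of pattern in text, return the Hamming distance
    if it is at most k, and None otherwise."""
    m = len(pattern)
    results = []
    for start in range(len(text) - m + 1):
        mismatches = 0
        for p, c in zip(pattern, text[start:start + m]):
            if p != c:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        results.append(mismatches if mismatches <= k else None)
    return results


distances = k_mismatch("abc", "abcabdxbc", 1)
```

    With k = 1, the alignments starting at positions 0, 3 and 6 of "abcabdxbc" are reported with distances 0, 1 and 1; every other alignment has at least two mismatches and is reported as None.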

    Improved Circular k-Mismatch Sketches

    The shift distance $\mathsf{sh}(S_1,S_2)$ between two strings $S_1$ and $S_2$ of the same length is defined as the minimum Hamming distance between $S_1$ and any rotation (cyclic shift) of $S_2$. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings $S_1$ and $S_2$ of length $n$ are given to two identical players (encoders), who independently compute sketches (summaries) $\mathtt{sk}(S_1)$ and $\mathtt{sk}(S_2)$, respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) $\mathsf{sh}(S_1,S_2)$ with high probability. This paper primarily focuses on the more general $k$-mismatch version of the problem, where the decoder is allowed to declare a failure if $\mathsf{sh}(S_1,S_2)>k$, where $k$ is a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular $k$-mismatch sketches of size $\widetilde{O}(k+D(n))$, where $D(n)$ is the number of divisors of $n$. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches. We circumvent this lower bound by designing a (non-linear) exact circular $k$-mismatch sketch of size $\widetilde{O}(k)$; this size matches communication-complexity lower bounds. We also design a $(1\pm \varepsilon)$-approximate circular $k$-mismatch sketch of size $\widetilde{O}(\min(\varepsilon^{-2}\sqrt{k}, \varepsilon^{-1.5}\sqrt{n}))$, which improves upon an $\widetilde{O}(\varepsilon^{-2}\sqrt{n})$-size sketch of Crouch and McGregor (APPROX'11).
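    The definition of the shift distance given above translates directly into a quadratic-time computation over both full strings. The sketch below shows only that definition, assuming both strings are available in one place; it involves no sketching or communication, which is the actual subject of the abstract. The function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def shift_distance(s1, s2):
    """Minimum Hamming distance between s1 and any rotation of s2."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    if not s1:
        return 0
    # Try every cyclic shift of s2 and keep the best alignment.
    return min(hamming(s1, s2[r:] + s2[:r]) for r in range(len(s2)))


d_rotation = shift_distance("abcd", "cdab")
d_one_off = shift_distance("abcd", "abce")
```

    Since "cdab" is a rotation of "abcd", the first distance is 0; "abce" differs from "abcd" in one position under the identity rotation and in all four under every other rotation, so the second distance is 1.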

    Interpreting the results of chemical stone analysis in the era of modern stone analysis techniques

    INTRODUCTION AND OBJECTIVE: Stone analysis should be performed in all first-time stone formers. The preferred analytical procedures are Fourier-transform infrared spectroscopy (FT-IR) or X-ray diffraction (XRD). However, due to limited resources, chemical analysis (CA) is still in use throughout the world. The aim of the study was to compare FT-IR and CA in well-matched stone specimens and characterize the pros and cons of CA. METHODS: In a prospective bi-center study, urinary stones were retrieved from 60 consecutive endoscopic procedures. To ensure that identical stone samples were sent for both analyses, the samples were first analyzed by micro-computed tomography to assess the uniformity of each specimen before being submitted for FT-IR and CA. RESULTS: Overall, the results of CA did not match the FT-IR results in 56% of the cases. In 16% of the cases CA missed the major stone component and in 40% the minor stone component. 37 of the 60 specimens contained CaOx as the major component by FT-IR, and CA reported major CaOx in 47/60, resulting in high sensitivity but very poor specificity. CA was relatively accurate for UA and cystine. CA missed struvite and calcium phosphate as a major component in all cases. In mixed stones the sensitivity of CA for the minor component was poor, generally less than 50%. CONCLUSIONS: Urinary stone analysis using CA provides only limited data that should be interpreted carefully. Urinary stone analysis using CA is likely to result in clinically significant errors in its assessment of stone composition. Although the monetary costs of CA are relatively modest, this method does not provide the level of analytical specificity required for proper management of patients with metabolic stones.

    A clinical evaluation of an ex vivo organ culture system to predict patient response to cancer therapy

    Introduction: Ex vivo organ cultures (EVOC) were recently optimized to sustain cancer tissue for 5 days with its complete microenvironment. We examined the ability of an EVOC platform to predict patient response to cancer therapy. Methods: A multicenter, prospective, single-arm observational trial. Samples were obtained from patients with newly diagnosed bladder cancer who underwent transurethral resection of bladder tumor and from core needle biopsies of patients with metastatic cancer. The tumors were cut into 250 μm slices and cultured within 24 h, then incubated for 96 h with vehicle or the intended-to-treat drug. The cultures were then fixed and stained to analyze their morphology and cell viability. Each EVOC was given a score based on cell viability, level of damage, and Ki67 proliferation, and the scores were correlated with the patients’ clinical response assessed by pathology or Response Evaluation Criteria in Solid Tumors (RECIST). Results: The cancer tissue and microenvironment, including endothelial and immune cells, were preserved at high viability with continued cell division for 5 days, demonstrating active cell signaling dynamics. A total of 34 cancer samples were tested by the platform and were correlated with clinical results. A higher EVOC score was correlated with better clinical response. The EVOC system showed a predictive specificity of 77.7% (7/9, 95% CI 0.4–0.97) and a sensitivity of 96% (24/25, 95% CI 0.80–0.99). Conclusion: EVOC cultured for 5 days showed high sensitivity and specificity for predicting clinical response to therapy among patients with muscle-invasive bladder cancer and other solid tumors.

    Emerging roles of hnRNPA1 in modulating malignant transformation

    Heterogeneous nuclear ribonucleoproteins (hnRNPs) are RNA-binding proteins associated with complex and diverse biological processes such as processing of heterogeneous nuclear RNAs (hnRNAs) into mature mRNAs, RNA splicing, transactivation of gene expression, and modulation of protein translation. hnRNPA1 is the most abundant and ubiquitously expressed member of this protein family and has been shown to be involved in multiple molecular events driving malignant transformation. In addition to selective mRNA splicing events promoting expression of specific protein variants, hnRNPA1 regulates the gene expression and translation of several key players associated with tumorigenesis and cancer progression. Here, we will summarize our current knowledge of the involvement of hnRNPA1 in cancer, including its roles in regulating cell proliferation, invasiveness, metabolism, adaptation to stress and immortalization